186 research outputs found

    Adaptive, fast walking in a biped robot under neuronal control and learning

    Get PDF
    Human walking is a dynamic, partly self-stabilizing process relying on the interaction of the biomechanical design with its neuronal control. The coordination of this process is a very difficult problem, and it has been suggested that it involves a hierarchy of levels, where the lower ones, e.g., interactions between muscles and the spinal cord, are largely autonomous, and where higher level control (e.g., cortical) arises only pointwise, as needed. This requires an architecture of several nested, sensori–motor loops where the walking process provides feedback signals to the walker's sensory systems, which can be used to coordinate its movements. To complicate the situation, at a maximal walking speed of more than four leg-lengths per second, the cycle period available to coordinate all these loops is rather short. In this study we present a planar biped robot, which uses the design principle of nested loops to combine the self-stabilizing properties of its biomechanical design with several levels of neuronal control. Specifically, we show how to adapt control by including online learning mechanisms based on simulated synaptic plasticity. This robot can walk with a high speed (>3.0 leg length/s), self-adapting to minor disturbances, and reacting in a robust way to abruptly induced gait changes. At the same time, it can learn walking on different terrains, requiring only few learning experiences. This study shows that the tight coupling of physical with neuronal control, guided by sensory feedback from the walking pattern itself, combined with synaptic learning may be a way forward to better understand and solve coordination problems in other complex motor tasks

    Cluster update and recognition

    Full text link
    We present a fast and robust cluster update algorithm that is especially efficient in implementing the task of image segmentation using the method of superparamagnetic clustering. We apply it to a Potts model with spin interactions that are are defined by gray-scale differences within the image. Motivated by biological systems, we introduce the concept of neural inhibition to the Potts model realization of the segmentation problem. Including the inhibition term in the Hamiltonian results in enhanced contrast and thereby significantly improves segmentation quality. As a second benefit we can - after equilibration - directly identify the image segments as the clusters formed by the clustering algorithm. To construct a new spin configuration the algorithm performs the standard steps of (1) forming clusters and of (2) updating the spins in a cluster simultaneously. As opposed to standard algorithms, however, we share the interaction energy between the two steps. Thus the update probabilities are not independent of the interaction energies. As a consequence, we observe an acceleration of the relaxation by a factor of 10 compared to the Swendson and Wang procedure.Comment: 4 pages, 2 figure

    Learning to reach by reinforcement learning using a receptive field based function approximation approach with continuous actions

    Get PDF
    Reinforcement learning methods can be used in robotics applications especially for specific target-oriented problems, for example the reward-based recalibration of goal directed actions. To this end still relatively large and continuous state-action spaces need to be efficiently handled. The goal of this paper is, thus, to develop a novel, rather simple method which uses reinforcement learning with function approximation in conjunction with different reward-strategies for solving such problems. For the testing of our method, we use a four degree-of-freedom reaching problem in 3D-space simulated by a two-joint robot arm system with two DOF each. Function approximation is based on 4D, overlapping kernels (receptive fields) and the state-action space contains about 10,000 of these. Different types of reward structures are being compared, for example, reward-on- touching-only against reward-on-approach. Furthermore, forbidden joint configurations are punished. A continuous action space is used. In spite of a rather large number of states and the continuous action space these reward/punishment strategies allow the system to find a good solution usually within about 20 trials. The efficiency of our method demonstrated in this test scenario suggests that it might be possible to use it on a real robot for problems where mixed rewards can be defined in situations where other types of learning might be difficult

    Reinforcement learning or active inference?

    Get PDF
    This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain

    Coverage, Continuity and Visual Cortical Architecture

    Get PDF
    The primary visual cortex of many mammals contains a continuous representation of visual space, with a roughly repetitive aperiodic map of orientation preferences superimposed. It was recently found that orientation preference maps (OPMs) obey statistical laws which are apparently invariant among species widely separated in eutherian evolution. Here, we examine whether one of the most prominent models for the optimization of cortical maps, the elastic net (EN) model, can reproduce this common design. The EN model generates representations which optimally trade of stimulus space coverage and map continuity. While this model has been used in numerous studies, no analytical results about the precise layout of the predicted OPMs have been obtained so far. We present a mathematical approach to analytically calculate the cortical representations predicted by the EN model for the joint mapping of stimulus position and orientation. We find that in all previously studied regimes, predicted OPM layouts are perfectly periodic. An unbiased search through the EN parameter space identifies a novel regime of aperiodic OPMs with pinwheel densities lower than found in experiments. In an extreme limit, aperiodic OPMs quantitatively resembling experimental observations emerge. Stabilization of these layouts results from strong nonlocal interactions rather than from a coverage-continuity-compromise. Our results demonstrate that optimization models for stimulus representations dominated by nonlocal suppressive interactions are in principle capable of correctly predicting the common OPM design. They question that visual cortical feature representations can be explained by a coverage-continuity-compromise.Comment: 100 pages, including an Appendix, 21 + 7 figure

    Motion processing with wide-field neurons in the retino-tecto-rotundal pathway

    Get PDF
    The retino-tecto-rotundal pathway is the main visual pathway in non-mammalian vertebrates and has been found to be highly involved in visual processing. Despite the extensive receptive fields of tectal and rotundal wide-field neurons, pattern discrimination tasks suggest a system with high spatial resolution. In this paper, we address the problem of how global processing performed by motion-sensitive wide-field neurons can be brought into agreement with the concept of a local analysis of visual stimuli. As a solution to this problem, we propose a firing-rate model of the retino-tecto-rotundal pathway which describes how spatiotemporal information can be organized and retained by tectal and rotundal wide-field neurons while processing Fourier-based motion in absence of periodic receptive-field structures. The model incorporates anatomical and electrophysiological experimental data on tectal and rotundal neurons, and the basic response characteristics of tectal and rotundal neurons to moving stimuli are captured by the model cells. We show that local velocity estimates may be derived from rotundal-cell responses via superposition in a subsequent processing step. Experimentally testable predictions which are both specific and characteristic to the model are provided. Thus, a conclusive explanation can be given of how the retino-tecto-rotundal pathway enables the animal to detect and localize moving objects or to estimate its self-motion parameters

    Mathematical properties of neuronal TD-rules and differential Hebbian learning: a comparison

    Get PDF
    A confusingly wide variety of temporally asymmetric learning rules exists related to reinforcement learning and/or to spike-timing dependent plasticity, many of which look exceedingly similar, while displaying strongly different behavior. These rules often find their use in control tasks, for example in robotics and for this rigorous convergence and numerical stability is required. The goal of this article is to review these rules and compare them to provide a better overview over their different properties. Two main classes will be discussed: temporal difference (TD) rules and correlation based (differential hebbian) rules and some transition cases. In general we will focus on neuronal implementations with changeable synaptic weights and a time-continuous representation of activity. In a machine learning (non-neuronal) context, for TD-learning a solid mathematical theory has existed since several years. This can partly be transfered to a neuronal framework, too. On the other hand, only now a more complete theory has also emerged for differential Hebb rules. In general rules differ by their convergence conditions and their numerical stability, which can lead to very undesirable behavior, when wanting to apply them. For TD, convergence can be enforced with a certain output condition assuring that the δ-error drops on average to zero (output control). Correlation based rules, on the other hand, converge when one input drops to zero (input control). Temporally asymmetric learning rules treat situations where incoming stimuli follow each other in time. Thus, it is necessary to remember the first stimulus to be able to relate it to the later occurring second one. To this end different types of so-called eligibility traces are being used by these two different types of rules. This aspect leads again to different properties of TD and differential Hebbian learning as discussed here. Thus, this paper, while also presenting several novel mathematical results, is mainly meant to provide a road map through the different neuronally emulated temporal asymmetrical learning rules and their behavior to provide some guidance for possible applications
    corecore